AITopics | multilingual topic model

Symmetric Correspondence Topic Models for Multilingual Text Analysis

Neural Information Processing SystemsMar-14-2024, 09:49:55 GMT

Topic modeling is a widely used approach to analyzing large text collections. A small number of multilingual topic models have recently been explored to discover latent topics among parallel or comparable documents, such as in Wikipedia. Other topic models that were originally proposed for structured data are also applicable to multilingual documents. Correspondence Latent Dirichlet Allocation (CorrLDA) is one such model; however, it requires a pivot language to be specified in advance. We propose a new topic model, Symmetric Correspondence LDA (SymCorrLDA), that incorporates a hidden variable to control a pivot language, in an extension of CorrLDA. We experimented with two multilingual comparable datasets extracted from Wikipedia and demonstrate that SymCorrLDA is more effective than some other existing multilingual topic models.

pivot language, symcorrlda, topic model, (13 more...)

Neural Information Processing Systems

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)
Asia > Middle East > Jordan (0.04)
North America > Canada > Quebec > Montreal (0.04)
(15 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)

Add feedback

Symmetric Correspondence Topic Models for Multilingual Text Analysis

Neural Information Processing SystemsApr-6-2023, 12:22:52 GMT

Topic modeling is a widely used approach to analyzing large text collections. A small number of multilingual topic models have recently been explored to discover latent topics among parallel or comparable documents, such as in Wikipedia. Other topic models that were originally proposed for structured data are also applicable to multilingual documents. Correspondence Latent Dirichlet Allocation (CorrLDA) is one such model; however, it requires a pivot language to be specified in advance. We propose a new topic model, Symmetric Correspondence LDA (SymCorrLDA), that incorporates a hidden variable to control a pivot language, in an extension of CorrLDA.

multilingual text analysis, multilingual topic model, symmetric correspondence topic model, (2 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)

Add feedback

Multilingual and Multimodal Topic Modelling with Pretrained Embeddings

Zosa, Elaine, Pivovarova, Lidia

arXiv.org Artificial IntelligenceNov-15-2022

This paper presents M3L-Contrast -- a novel multimodal multilingual (M3L) neural topic model for comparable data that maps texts from multiple languages and images into a shared topic space. Our model is trained jointly on texts and images and takes advantage of pretrained document and image embeddings to abstract the complexities between different languages and modalities. As a multilingual topic model, it produces aligned language-specific topics and as multimodal model, it infers textual representations of semantic concepts in images. We demonstrate that our model is competitive with a zero-shot topic model in predicting topic distributions for comparable multilingual data and significantly outperforms a zero-shot model in predicting topic distributions for comparable texts and images. We also show that our model performs almost as well on unaligned embeddings as it does on aligned embeddings.

large language model, natural language, topic model, (19 more...)

arXiv.org Artificial Intelligence

2211.08057

Country:

Europe > Finland > Uusimaa > Helsinki (0.04)
Asia > Middle East > Jordan (0.04)
South America (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.82)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.66)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.55)

Add feedback

Symmetric Correspondence Topic Models for Multilingual Text Analysis

Fukumasu, Kosuke, Eguchi, Koji, Xing, Eric P.

Neural Information Processing SystemsFeb-14-2020, 22:42:40 GMT

Topic modeling is a widely used approach to analyzing large text collections. A small number of multilingual topic models have recently been explored to discover latent topics among parallel or comparable documents, such as in Wikipedia. Other topic models that were originally proposed for structured data are also applicable to multilingual documents. Correspondence Latent Dirichlet Allocation (CorrLDA) is one such model; however, it requires a pivot language to be specified in advance. We propose a new topic model, Symmetric Correspondence LDA (SymCorrLDA), that incorporates a hidden variable to control a pivot language, in an extension of CorrLDA.

multilingual text analysis, multilingual topic model, symmetric correspondence topic model, (2 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)

Add feedback

Symmetric Correspondence Topic Models for Multilingual Text Analysis

Fukumasu, Kosuke, Eguchi, Koji, Xing, Eric P.

Neural Information Processing SystemsDec-31-2012

Topic modeling is a widely used approach to analyzing large text collections. A small number of multilingual topic models have recently been explored to discover latent topics among parallel or comparable documents, such as in Wikipedia. Other topic models that were originally proposed for structured data are also applicable to multilingual documents. Correspondence Latent Dirichlet Allocation (CorrLDA) is one such model; however, it requires a pivot language to be specified in advance. We propose a new topic model, Symmetric Correspondence LDA (SymCorrLDA), that incorporates a hidden variable to control a pivot language, in an extension of CorrLDA. We experimented with two multilingual comparable datasets extracted from Wikipedia and demonstrate that SymCorrLDA is more effective than some other existing multilingual topic models.

artificial intelligence, natural language, pivot language, (17 more...)

Neural Information Processing Systems

Country:

Europe (1.00)
North America > United States > Pennsylvania (0.46)
Asia > Japan > Honshū > Kansai (0.14)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)

Add feedback

Multilingual Topic Models for Unaligned Text

Boyd-Graber, Jordan, Blei, David

arXiv.org Machine LearningMay-9-2012

We develop the multilingual topic model for unaligned text (MuTo), a probabilistic model of text that is designed to analyze corpora composed of documents in two languages. From these documents, MuTo uses stochastic EM to simultaneously discover both a matching between the languages and multilingual latent topics. We demonstrate that MuTo is able to find shared topics on real-world multilingual corpora, successfully pairing related documents across languages. MuTo provides a new framework for creating multilingual topic models without needing carefully curated parallel corpora and allows applications built using the topic model formalism to be applied to a much wider class of corpora. Topic models are a powerful formalism for unsupervised analysis of corpora [1, 8].

artificial intelligence, computational linguistic, natural language, (15 more...)

arXiv.org Machine Learning

1205.2657

Country: